# FP8 quantization

A directory of FP8-quantized model releases. Each entry lists the publisher, license, tags, download count, and like count as reported on the hosting page.

| Model | Publisher | License | Tags | Downloads | Likes | Description |
|---|---|---|---|---|---|---|
| Bielik 4.5B V3.0 Instruct FP8 Dynamic | speakleash | Apache-2.0 | Large Language Model, Other | 74 | 1 | FP8-quantized version of Bielik-4.5B-v3.0-Instruct; AutoFP8 quantizes both weights and activations to FP8, cutting disk-space and GPU-memory requirements by roughly 50%. |
| Bielik 1.5B V3.0 Instruct FP8 Dynamic | speakleash | Apache-2.0 | Large Language Model, Other | 31 | 1 | FP8 dynamic quantization of Bielik-1.5B-v3.0-Instruct, adapted for the vLLM and SGLang inference frameworks; AutoFP8 reduces parameters from 16-bit to 8-bit, significantly lowering disk-space and GPU VRAM requirements. |
| Qwen3 30B A3B FP8 Dynamic | RedHatAI | Apache-2.0 | Large Language Model, Transformers | 187 | 2 | FP8-quantized version of Qwen3-30B-A3B; significantly reduces memory requirements and compute cost while maintaining the original model's accuracy. |
| Qwen3 8B FP8 Dynamic | RedHatAI | Apache-2.0 | Large Language Model, Transformers | 81 | 1 | FP8-quantized version of Qwen3-8B; significantly reduces GPU-memory and disk usage while maintaining the original model's performance. |
| Qwen3 32B FP8 Dynamic | RedHatAI | Apache-2.0 | Large Language Model, Transformers | 917 | 8 | FP8 dynamic quantization of Qwen3-32B; significantly reduces memory requirements and improves computational efficiency. |
| Mistral Small 3.1 24B Instruct 2503 FP8 Dynamic | RedHatAI | Apache-2.0 | Safetensors, Supports Multiple Languages | 2,650 | 5 | 24B-parameter conditional-generation model on the Mistral3 architecture, optimized with FP8 dynamic quantization; suited to multilingual text generation and visual-understanding tasks. |
| QwQ 32B FP8 | qingcheng-ai | Apache-2.0 | Large Language Model, Transformers | 144 | 6 | FP8-quantized version of QwQ-32B; maintains nearly the same accuracy as the BF16 version while enabling faster inference. |
| QwQ 32B FP8 Dynamic | nm-testing | MIT | Large Language Model, Transformers | 3,895 | 3 | FP8-quantized version of QwQ-32B; dynamic quantization cuts storage and memory requirements by 50% while retaining 99.75% of the original model's accuracy. |
| QwQ 32B FP8 Dynamic | RedHatAI | MIT | Large Language Model, Transformers | 3,107 | 8 | FP8-quantized version of QwQ-32B; dynamic quantization cuts storage and memory requirements by 50% while retaining 99.75% of the original model's accuracy. |
| Flex.1 Alpha Fp8 | gmonsoon | Apache-2.0 | Text-to-Image, English | 225 | 5 | Safetensors release of Flex.1-alpha with float8_e4m3fn weights, for text-to-image generation. |
| SD3.5 Large Fp8 | dyedd | Other | Image Generation | 88 | 2 | FP8-quantized version of Stable Diffusion 3.5 Large for text-to-image generation. |
| Llama 3.2 1B Instruct FP8 | RedHatAI | | Large Language Model, Safetensors, Supports Multiple Languages | 1,718 | 3 | FP8-quantized version of Llama-3.2-1B-Instruct for multilingual commercial and research use; performance is close to the original model. |
| Llama 3.2 3B Instruct FP8 Dynamic | RedHatAI | | Large Language Model, Safetensors, Supports Multiple Languages | 986 | 3 | FP8-quantized version of Llama-3.2-3B-Instruct for multilingual commercial and research use, particularly assistant-style chat. |
| Meta Llama 3.1 70B FP8 | RedHatAI | | Large Language Model, Transformers, Supports Multiple Languages | 191 | 2 | FP8-quantized version of Meta-Llama-3.1-70B for multilingual commercial and research use; both weights and activations are in FP8, cutting storage and memory requirements by roughly 50%. |
| Meta Llama 3.1 8B FP8 | RedHatAI | | Large Language Model, Transformers, Supports Multiple Languages | 4,154 | 7 | FP8-quantized version of Meta-Llama-3.1-8B for multilingual commercial and research use. |
| Meta Llama 3.1 70B Instruct FP8 | RedHatAI | | Large Language Model, Transformers, Supports Multiple Languages | 71.73k | 45 | FP8-quantized version of Meta-Llama-3.1-70B-Instruct for multilingual commercial and research use, especially assistant-style chat. |
| Meta Llama 3.1 8B Instruct FP8 | RedHatAI | | Large Language Model, Transformers, Supports Multiple Languages | 361.53k | 42 | FP8-quantized version of Meta-Llama-3.1-8B-Instruct for multilingual commercial and research use, specially optimized for assistant-style chat. |
© 2025 AIbase